Abstract: This paper presents a novel coding scheme for distributed storage systems containing nodes with adversarial errors. The key challenge in such systems is the propagation of erroneous data from a single corrupted node to the rest of the system during the node repair process. We present a concatenated coding scheme based on two types of codes: a maximum rank distance (MRD) code as the outer code and an optimal repair maximum distance separable (MDS) array code as the inner code. We prove that this coding scheme attains the upper bound on the resilience capacity, i.e., the amount of data that can be stored reliably in a system with a limited number of corrupted nodes.

I. Introduction 

In light of the exponential growth in the amount of data being generated, designing new storage mechanisms to handle this vast amount of data has emerged as one of the primary challenges. The surge in the number of papers in this area over the past decade is a manifestation of the importance of this problem. With the ever increasing size of items that need to be stored (e.g., HD videos, large databases) and the distributed nature of the origin and access locations of most stored content, having a single storage location for a data item is neither feasible nor desirable. Distributed storage systems (DSS) alleviate this problem by storing the content over a network of nodes.

One of the main issues faced by DSS is resilience against node failures. If the data is left uncoded, node failures may result in permanent loss of (a portion of) the stored data. Thus, coding is essential to instill resilience to node failures. Given the prevalence of single node failures in DSS (e.g., a user exiting a P2P system, or a power outage in a single data center in the cloud), a failed node is repaired as soon as the failure occurs in order to sustain the desired level of redundancy in the system. Instantaneous repair is desirable for several reasons, one of which is to prevent permanent loss of data in the event of a catastrophic failure. To repair the failed node, data is downloaded from surviving nodes, and a function of this data is stored as the 'restored' node. The amount of data downloaded in this repair process is called the repair bandwidth. A naive strategy for node repair is to download all data from the surviving nodes to enable regeneration of the failed node. However, such an approach leads to a large repair bandwidth and consumes a vast amount of system resources (in terms of bandwidth and energy). Therefore, it is desirable to have a repair scheme with as small a repair bandwidth as possible. In [1], Dimakis et al. establish an information theoretic lower bound on the repair bandwidth for MDS codes using min-cut analysis and show a trade-off between the repair bandwidth and the amount of data stored on each node.



Under functional repair, the failed node is not necessarily replicated exactly; instead, it is replaced by a node that is functionally equivalent. [2] and [3] present storage schemes (i.e., codes) that achieve the lower bound on repair bandwidth. An alternative and rather desirable notion of repair is exact repair, where the regenerated data is an exact replica of what was stored on the failed node. The works in [4], [5], and [6] devise storage mechanisms which, under different restrictive settings (e.g., k ≤ max(3, n/2)), achieve the lower bound derived in [1]. Recently, this result has been extended to more general settings by various researchers. [7] presents codes for DSS with two parity nodes which accomplish exact regeneration while being optimal in repair bandwidth. In [8] and [9], permutation-matrix based codes are designed to achieve the bound on repair bandwidth for the repair of systematic nodes for all (n, k) pairs. [10] further generalizes the idea of [8] to obtain MDS array codes for DSS that allow optimal exact regeneration of parity nodes as well.

While a majority of the work in the DSS literature addresses the storage versus repair bandwidth trade-off, another important issue that has recently received attention is the design of storage schemes that ensure security and reliability of the stored content against adversarial errors [11], [12], [13]. It is the latter issue, inducing reliability against adversarial errors, that we address in this paper. The dynamic nature of DSS due to node repair makes dealing with erroneous nodes non-trivial, as a single corrupted node may subsequently corrupt a large portion of the system by spreading the pollution during node repair. In [13], Pawar et al. address the reliability issue in detail and derive upper bounds on the amount of data that can be stored in the system and reliably made available to a data collector when optimal node repair is performed. The authors consider two models for adversarial errors introduced in the storage nodes: 1) an omniscient adversary, who can observe all the nodes and knows the coding scheme employed by the system, and 2) a limited-knowledge adversary, who can observe at most a fixed number of nodes throughout. In both error models, the adversary can control at most a fixed number of nodes and inject false information through these nodes during the entire operation of the DSS. The authors also propose coding strategies that achieve the upper bound in the bandwidth-limited regime.

In this paper, we adopt the notion of an omniscient adversary. As in [13], we assume an upper bound on the number of nodes that can be affected. This upper bound is a system parameter that is used to design coding schemes that ensure reliable delivery of the original data to an end user, i.e., a data collector. In addition, we classify adversarial attacks into two classes:

1) One-time errors: an omniscient adversary replaces the content of an affected node with nonsensical information only once. The affected node uses this same polluted information during all subsequent repair and data collection processes.

2) Dynamic errors: an omniscient adversary may replace the content of an affected node each time the node is asked for data during the data collection or repair process. This kind of attack is more difficult to manage than one-time errors.

We present a novel concatenated coding scheme for DSS which provides resilience against these two classes of attacks. The scheme attains the upper bound on the amount of data that can be stored reliably in a DSS [13]. In our scheme, the content to be stored is first encoded using a maximum rank distance (MRD) code, and the output of this outer code is further encoded using an optimal repair maximum distance separable (MDS) array code. Using an MRD code, which is an optimal rank-metric code, allows us to quantify the errors introduced in the system by their rank as opposed to their Hamming weight. Due to the dynamic nature of DSS, a large number of nodes can become polluted even by a single erroneous node, since false information spreads as a result of node repairs. Thus, a single polluted node can infect many others, resulting in an error vector with a large Hamming weight. Using rank-metric codes alleviates this problem, as the error that a data collector has to handle has rank at most the total amount of data stored in the nodes controlled by the adversary, and can therefore be corrected by an MRD code with a sufficient rank distance. Using an (n, k) bandwidth efficient MDS array code as the inner code facilitates bandwidth efficient node repair in the event of a single node failure and allows the data collector to recover the original data from any subset of k storage nodes. In this paper, we use exact-regenerating bandwidth efficient codes operating at the minimum-storage regenerating (MSR) point [1]. However, our construction works for any regenerating code.

The proposed coding scheme is directly applicable to the one-time error attack model. However, the model with dynamic errors is more complicated, as it allows a single malicious node to change its pollution pattern, thus introducing an arbitrarily large extent of error both in Hamming weight and in rank. In this case, we combine our concatenated coding scheme with the standard hash-function-based approach in order to control the amount (rank) of pollution (error) introduced by an adversarial node. Note that hash functions have previously been used in the context of DSS to deal with errors [11], [13]. While promising, hash functions provide only probabilistic guarantees for pollution containment.

The rest of the paper is organized as follows. In Section II, we first give a brief description of rank-metric codes, along with Gabidulin MRD codes and the error model in the rank metric for these codes. Subsequently, we describe MDS array codes and present two examples of bandwidth efficient MDS array codes that are later used as inner codes in our construction. In Section III, we describe the construction of our storage scheme and prove its error resilience under the one-time error model. Further, we present a few examples to illustrate our scheme and prove that our codes attain the upper bound on resilience capacity. Finally, in Section IV, we briefly discuss the dynamic error model.

II. Preliminaries 
A. Rank-Metric Codes 

Rank-metric codes were introduced by Delsarte [14] and rediscovered in [15], [16]. These codes have applications in different fields, such as space-time coding [17], random network coding [18], [19], and public key cryptosystems [20]. Our goal in this paper is to show that rank-metric codes are useful for error correction in distributed storage as well.

Let $\mathbb{F}_q$ be the finite field of size $q$. For two $N \times m$ matrices $A$ and $B$ over $\mathbb{F}_q$, the rank distance is defined by

$$ d_R(A, B) \stackrel{\mathrm{def}}{=} \operatorname{rank}(A - B). $$
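As a minimal sketch, the rank distance can be computed by Gaussian elimination. The Python below assumes $q = p$ is prime, so that field arithmetic is integer arithmetic modulo $p$ (an extension field would need its own multiplication routine); the helper names `rank_mod_p` and `rank_distance` are illustrative. The example shows two matrices that differ in every entry yet are at rank distance 1.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over the prime field GF(p), by Gauss-Jordan elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        # Look for a pivot in this column among rows not yet used as pivots.
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], p - 2, p)  # inverse of the pivot (Fermat's little theorem)
        m[rank] = [(x * inv) % p for x in m[rank]]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def rank_distance(A, B, p):
    """d_R(A, B) = rank(A - B) for two N x m matrices over GF(p)."""
    return rank_mod_p([[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)], p)


# Every entry differs, yet the rank distance is only 1 (the rows are proportional).
A = [[1, 1, 1], [2, 2, 2]]
Z = [[0, 0, 0], [0, 0, 0]]
print(rank_distance(A, Z, 3))  # 1
```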



An $[N \times m, \varrho, \delta]$ rank-metric code $\mathcal{C}$ is a linear code whose codewords are $N \times m$ matrices over $\mathbb{F}_q$; they form a linear subspace of $\mathbb{F}_q^{N \times m}$ of dimension $\varrho$, and for any two distinct codewords $A$ and $B$, $d_R(A, B) \ge \delta$. For an $[N \times m, \varrho, \delta]$ rank-metric code $\mathcal{C}$ we have $\varrho \le \min\{N(m - \delta + 1), m(N - \delta + 1)\}$ [14], [15], [16]. This bound, called the Singleton bound for the rank metric, is attained for all possible parameters. Codes that attain this bound are called maximum rank distance (MRD) codes.
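The bound is easy to evaluate directly. In this small illustrative helper (the name `rank_singleton_bound` is hypothetical), note that for $m \le N$ the minimum is always the first term, because $m(N-\delta+1) - N(m-\delta+1) = (N-m)(\delta-1) \ge 0$; this is why the Gabidulin codes below have dimension $\varrho = NK$ with $K = m - \delta + 1$.

```python
def rank_singleton_bound(N, m, delta):
    """Largest dimension (over F_q) of an [N x m, rho, delta] rank-metric code."""
    if not 1 <= delta <= min(N, m):
        raise ValueError("need 1 <= delta <= min(N, m)")
    return min(N * (m - delta + 1), m * (N - delta + 1))


# For m <= N the first term is the minimum: rho = N * K with K = m - delta + 1.
print(rank_singleton_bound(8, 6, 3))  # 32
```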

1) Gabidulin MRD Codes: An important family of linear MRD codes was presented by Gabidulin [15]. These codes can be seen as the analogs of Reed-Solomon codes for the rank metric. Let $m \le N$. A codeword in an $[N \times m, \varrho, \delta]$ rank-metric code $\mathcal{C}$ can be represented by a column vector $c = [c_1, c_2, \ldots, c_m]^T$, where $c_i \in \mathbb{F}_{q^N}$, since $\mathbb{F}_{q^N}$ can be viewed as an $N$-dimensional vector space over $\mathbb{F}_q$. Let $g_i \in \mathbb{F}_{q^N}$, $1 \le i \le m$, be linearly independent over $\mathbb{F}_q$. The generator matrix $\mathcal{G}$ of an $[N \times m, \varrho, \delta]$ Gabidulin MRD code is given by

$$ \mathcal{G} = \begin{bmatrix} g_1 & g_2 & \cdots & g_m \\ g_1^{[1]} & g_2^{[1]} & \cdots & g_m^{[1]} \\ \vdots & \vdots & \ddots & \vdots \\ g_1^{[K-1]} & g_2^{[K-1]} & \cdots & g_m^{[K-1]} \end{bmatrix}, $$

where $K = m - \delta + 1$, $\varrho = NK$, and $[i] = q^{i \bmod N}$. We will use this family of MRD codes in our construction of codes for distributed storage.
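To make the Frobenius powers $g^{[i]}$ concrete, here is a toy sketch with $q = 2$ and $N = 4$. The defining polynomial $x^4 + x + 1$, the choice $g = (1, x, x^2)$, and all function names are assumptions made for illustration; field elements are stored as 4-bit integers whose bits are polynomial coefficients. The code builds $\mathcal{G}$ row by row through repeated squaring and encodes a message $u \in \mathbb{F}_{q^N}^K$ as the row vector $u\mathcal{G}$.

```python
N = 4                 # extension degree: F_{2^4} over the base field F_2 (q = 2)
MODULUS = 0b10011     # assumed irreducible polynomial x^4 + x + 1 defining F_{2^4}


def gf_mul(a, b):
    """Multiply two elements of F_{2^N}, each stored as an N-bit polynomial in x."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in characteristic 2 is XOR
        b >>= 1
        a <<= 1
        if a & (1 << N):         # reduce modulo the defining polynomial
            a ^= MODULUS
    return result


def frobenius(g, i):
    """g^[i] = g^(q^(i mod N)) with q = 2: apply the squaring map (i mod N) times."""
    for _ in range(i % N):
        g = gf_mul(g, g)
    return g


def gabidulin_generator(gs, delta):
    """K x m generator matrix whose row i is [g_1^[i], ..., g_m^[i]], with K = m - delta + 1."""
    K = len(gs) - delta + 1
    return [[frobenius(g, i) for g in gs] for i in range(K)]


def encode(u, G):
    """Codeword c = u G for a message u in F_{2^N}^K (sums are XORs)."""
    c = [0] * len(G[0])
    for ui, row in zip(u, G):
        for j, gij in enumerate(row):
            c[j] ^= gf_mul(ui, gij)
    return c


# g = (1, x, x^2) is linearly independent over F_2; m = 3 and delta = 2, so K = 2.
G = gabidulin_generator([0b0001, 0b0010, 0b0100], delta=2)
print(G)                  # [[1, 2, 4], [1, 4, 3]]
print(encode([1, 1], G))  # [0, 6, 7]
```

A practical implementation would use a larger field and a table-based multiplier; the bit-serial `gf_mul` is chosen here only for readability.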

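2) Rank Error Correction: Let $\mathcal{C} \subseteq \mathbb{F}_{q^N}^m$ be a Gabidulin MRD code with minimum distance $\delta$. Let $c \in \mathcal{C}$ be the transmitted codeword and let $r = c + e$ be the received word. The code $\mathcal{C}$ can correct any error $e \in \mathbb{F}_{q^N}^m$ of rank $t$ as long as $2t \le \delta - 1$, where the rank of $e$ is the rank over $\mathbb{F}_q$ of the $N \times m$ matrix obtained by expanding each coordinate of $e$ over a basis of $\mathbb{F}_{q^N}$. Note that since $\operatorname{rank}(e) = t$, we can write

$$ e = [u_1 \; \cdots \; u_t] \begin{bmatrix} e_1 \\ \vdots \\ e_t \end{bmatrix}, $$

where $u_i \in \mathbb{F}_{q^N}$ are linearly independent over the base field $\mathbb{F}_q$, and $e_i \in \mathbb{F}_q^m$ are linearly independent vectors of length $m$. Decoding algorithms for rank-metric codes are provided in [18], [21].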
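The following sketch illustrates why the rank, rather than the Hamming weight, is the right measure here. It assumes $q = 2$ and represents each coordinate of $e \in \mathbb{F}_{2^N}^m$ by its $N$ coordinates over $\mathbb{F}_2$, so $e$ becomes an $N \times m$ binary matrix; the function names are hypothetical. With $N = 8$ and $m = 6$, an error built from $t = 2$ terms corrupts all six coordinates but has rank 2, so by the condition above a Gabidulin code with $\delta = 5$ corrects it.

```python
def rank_gf2(rows):
    """Rank over F_2 of a 0/1 matrix, packing each row into an integer bit mask."""
    vals = [int("".join(map(str, r)), 2) for r in rows]
    rank = 0
    while vals:
        pivot = max(vals)
        if pivot == 0:
            break
        vals.remove(pivot)
        rank += 1
        top = pivot.bit_length() - 1
        # Eliminate the pivot's leading bit from every remaining row.
        vals = [v ^ pivot if (v >> top) & 1 else v for v in vals]
    return rank


def error_matrix(us, es):
    """Expand e = sum_i u_i e_i as an N x m matrix over F_2: u_i in F_2^N, e_i in F_2^m."""
    N, m = len(us[0]), len(es[0])
    E = [[0] * m for _ in range(N)]
    for u, e in zip(us, es):
        for r in range(N):
            for c in range(m):
                E[r][c] ^= u[r] & e[c]
    return E


def hamming_weight(E):
    """Number of nonzero coordinates of e over F_{2^N}, i.e., nonzero columns of E."""
    return sum(any(row[c] for row in E) for c in range(len(E[0])))


# t = 2 terms: every one of the m = 6 coordinates is corrupted, but the rank is only 2.
us = [[1, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0]]
E = error_matrix(us, [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]])
print(hamming_weight(E), rank_gf2(E))  # 6 2
```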
Fig. 1: Illustration of the second node repair process in the (5, 3) Zigzag code: (a) for an error-free system, (b) for a system with erroneous information at the first storage node.



B. MDS Array Codes for Distributed Storage 

A linear array code $C$ of dimensions $\alpha \times n$ over $\mathbb{F}_q$ is defined as a linear subspace of $\mathbb{F}_q^{\alpha n}$. Its minimum distance $d$ is defined as the minimum Hamming distance over $\mathbb{F}_{q^\alpha}$ when the codewords of $C$ are considered as vectors of length $n$ over $\mathbb{F}_{q^\alpha}$. An array code $C$ is called an $(n, k)$ maximum distance separable (MDS) code if $|C| = q^{\alpha k}$, where $k = n - d + 1$ [22], [23]. Note that an MRD code is also an MDS array code.

Let $x = [x_1, x_2, \ldots, x_k]^T \in \mathbb{F}_q^{\alpha k}$ be an information vector, where $x_i \in \mathbb{F}_q^{\alpha}$ is a block of size $\alpha$ for all $1 \le i \le k$. These $k$ blocks are encoded into $n$ encoded blocks $y_i \in \mathbb{F}_q^{\alpha}$, $1 \le i \le n$, stored in $n$ nodes of size $\alpha$, as follows:

$$ y = G x, $$

where $y = [y_1, y_2, \ldots, y_n]^T$ and the generator matrix $G$ is an $n \times k$ matrix of blocks of size $\alpha \times \alpha$ given by

$$ G = \begin{bmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,k} \\ A_{2,1} & A_{2,2} & \cdots & A_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n,1} & A_{n,2} & \cdots & A_{n,k} \end{bmatrix}. $$

An array code $C$ has the MDS property if every submatrix of $G$ formed by $k$ of its block rows (a $k \times k$ block matrix) has full rank $\alpha k$. In other words, if an $(n, k)$ MDS code is used to store data in a system and any set of $n - k$ storage nodes fails, the original data can be recovered from the $k$ surviving nodes.
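The MDS property can be checked exhaustively for small codes. This sketch assumes a prime field GF($p$) and a toy $(4, 2)$ code whose blocks are scalar multiples of $I_\alpha$; the matrices `good` and `bad` and all function names are hypothetical, and real optimal repair codes use non-scalar blocks. The check visits all $\binom{n}{k}$ subsets of nodes, so it is only practical for small $n$; the rank helper is repeated so that the block is self-contained.

```python
from itertools import combinations


def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over GF(p), by Gauss-Jordan elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], p - 2, p)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_mds(G, n, k, alpha, p):
    """True if every set of k block rows of G (n*alpha x k*alpha over GF(p)) has rank k*alpha."""
    for nodes in combinations(range(n), k):
        sub = [G[i * alpha + r] for i in nodes for r in range(alpha)]
        if rank_mod_p(sub, p) < k * alpha:
            return False  # these k nodes cannot reconstruct the data
    return True


def scalar_blocks(S, alpha):
    """Expand an n x k scalar matrix into blocks s * I_alpha (a toy, hypothetical array code)."""
    k = len(S[0])
    return [[S[i][j] if r == c else 0 for j in range(k) for c in range(alpha)]
            for i in range(len(S)) for r in range(alpha)]


good = [[1, 0], [0, 1], [1, 1], [1, 2]]  # (4, 2) code over GF(3)
bad = [[1, 0], [0, 1], [1, 1], [2, 2]]   # nodes 3 and 4 are linearly dependent
print(is_mds(scalar_blocks(good, 2), 4, 2, 2, 3))  # True
print(is_mds(scalar_blocks(bad, 2), 4, 2, 2, 3))   # False
```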

We say that an MDS array code satisfies the optimal repair property if a single failed node can be repaired by downloading $\alpha/(n - k)$ elements from every surviving node [1].
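For intuition about the savings, the sketch below compares the optimal repair download, $(n-1)\alpha/(n-k)$, with a common baseline for MDS codes that downloads $k$ entire nodes (i.e., the whole file) to rebuild the lost one. The baseline and the function names are illustrative assumptions.

```python
from fractions import Fraction


def optimal_repair_bandwidth(n, k, alpha):
    """Total download when each of the n - 1 surviving nodes sends alpha / (n - k) symbols."""
    return (n - 1) * Fraction(alpha, n - k)


def mds_baseline_repair_bandwidth(k, alpha):
    """Baseline: download k whole nodes (the entire file) and re-encode the lost node."""
    return k * alpha


# A (5, 3) code with alpha = 4: each of the 4 helpers sends 2 symbols.
print(optimal_repair_bandwidth(5, 3, 4), mds_baseline_repair_bandwidth(3, 4))  # 8 12
```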

1) Examples of Optimal Repair MDS Array Codes: In the following, we present two examples of optimal repair MDS array codes for DSS, which we later use to illustrate our coding scheme. Due to space constraints, we describe only the MDS codes used in the examples. For general constructions, interested readers may refer to the papers that present these codes.

Example 1: (5, 3) Zigzag code [9]. This class of MDS array codes is based on permutation matrices. For the (5, 3) Zigzag code presented in Fig. 1, the first three nodes are systematic nodes which store the data $[c_1, c_2, \ldots, c_{12}]$. The block generator matrix for this code is given by

$$ G = \begin{bmatrix} I & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & I \\ I & I & I \\ I & A_2 & A_3 \end{bmatrix}, $$

where $I$ and $0$ denote the identity matrix and the all-zero matrix, respectively. Fig. 1(a) describes the node repair process for the (5, 3) Zigzag code. When the second node fails, the newcomer node downloads the symbols from the shaded locations at the surviving nodes.

Example 2: (5, 3) Hadamard design codes [10]. This class of MDS array codes employs interference alignment strategies to perform node repair. In the (5, 3) example presented in Fig. 2, the first three nodes are systematic nodes which store the data $y_i = [c_{(i-1)\alpha+1}, \ldots, c_{i\alpha}]$, $1 \le i \le 3$. The block generator matrix for a (5, 3) Hadamard design based code is given by

$$ G = \begin{bmatrix} I & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & I \\ I & I & I \\ A_{5,1} & A_{5,2} & A_{5,3} \end{bmatrix}, $$

where each $A_{5,i}$, $1 \le i \le 3$, is formed from the diagonal matrix $X_i = I_{2^{i-1}} \otimes \mathrm{blkdiag}(I_{2^{3-i}}, -I_{2^{3-i}})$, the identity matrix $I$, and scalar coefficients $a_i, b_i \in \mathbb{F}_q$ chosen as in [10]. The process of repairing the second node is illustrated in Fig. 2(a). During this process, the newcomer downloads $V^T y_4$ and $V^T y_5$ from nodes 4 and 5, respectively (see $D_1$ in Fig. 2(a)). The newcomer uses $\tilde{V}_1$ and $\tilde{V}_3$ as repair matrices corresponding to nodes 1 and 3, respectively, where $\tilde{V}_1^T$ and $\tilde{V}_3^T$ are bases for the row spaces of $[V \; A_{5,1}V]^T$ and $[V \; A_{5,3}V]^T$, respectively. The information downloaded from nodes 1 and 3 is used to cancel the interference terms (the contributions of $y_1$ and $y_3$ in $D_1$). After interference mitigation, a linear system of equations is solved to obtain $y_2$.

III. The Construction 

In this section we present our coding scheme and prove its 
error tolerance under the one-time error model. 

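Let $M \in \mathbb{F}_q^{KN}$ denote a file of size $KN$. $M$ is partitioned into $K$ parts of size $N$ each. We form an $N \times K$ matrix $\mathcal{M}$ over $\mathbb{F}_q$, where the $i$th part of $M$ forms the $i$th column of $\mathcal{M}$, for all $1 \le i \le K$. Let $\mathcal{C}$ be an $[N \times m, \varrho = NK, \delta = m - K + 1]$ Gabidulin MRD code with $m \le N$. Let $c_M \in \mathbb{F}_{q^N}^m$ be the codeword in $\mathcal{C}$ that corresponds to the information matrix $\mathcal{M}$. Let $\alpha$ and $k$ be positive integers such that $m = \alpha k$. Let $C$ be an $(n, k)$ MDS array code of dimensions $\alpha \times n$. We partition the vector $c_M$ into $k$ parts of size $\alpha$ and form $n$ nodes of size $\alpha$ each, according to the encoding algorithm of the code $C$. Note that we use an MDS array code over $\mathbb{F}_q$, i.e., its generator matrix is over $\mathbb{F}_q$, and during node repair a set of surviving nodes transmits linear combinations of the stored elements with coefficients from $\mathbb{F}_q$. Consequently, every stored or transmitted symbol is an $\mathbb{F}_q$-linear combination of codeword symbols and injected error symbols, so node repair cannot increase the rank over $\mathbb{F}_q$ of the injected error.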
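The bookkeeping of the construction can be sketched as follows. The MRD encoding (for instance with the generator matrix $\mathcal{G}$ of Section II) and the inner encoding $y = Gx$ are deliberately omitted, and integers stand in for field symbols; the helper names are hypothetical. Each column of the $N \times K$ matrix is read as one $\mathbb{F}_{q^N}$ symbol of the MRD message, and the resulting codeword of length $m = \alpha k$ is cut into the $k$ blocks fed to the array code.

```python
def file_to_matrix(M, N, K):
    """Split a file of K*N symbols into K parts of size N; part i becomes column i of an N x K matrix."""
    if len(M) != K * N:
        raise ValueError("the file must contain exactly K * N symbols")
    parts = [M[i * N:(i + 1) * N] for i in range(K)]
    return [[parts[col][row] for col in range(K)] for row in range(N)]


def split_codeword(c, k, alpha):
    """Partition an MRD codeword of length m = alpha * k into blocks x_1, ..., x_k of size alpha."""
    if len(c) != k * alpha:
        raise ValueError("the codeword length must equal alpha * k")
    return [c[i * alpha:(i + 1) * alpha] for i in range(k)]


print(file_to_matrix([1, 2, 3, 4, 5, 6], N=3, K=2))   # [[1, 4], [2, 5], [3, 6]]
print(split_codeword(list(range(12)), k=3, alpha=4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```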

The following theorem shows that our system tolerates up to $t$ erroneous nodes if $2t\alpha + 1 \le \delta$.

Theorem 1: Let $t$ be the number of erroneous nodes in the system based on concatenated MRD and optimal repair MDS array codes. If $2t\alpha + 1 \le \delta$, then the original data can be recovered from any $k$ nodes.

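Proof: Let $c_M \in \mathbb{F}_{q^N}^m$ be the codeword in $\mathcal{C}$ that corresponds to the information matrix $\mathcal{M}$, and let $[x_1, x_2, \ldots, x_k]$, $x_i \in \mathbb{F}_{q^N}^{\alpha}$, be the partition of $c_M$ into $k$ parts of size $\alpha$ each. Let $[y_1, y_2, \ldots, y_n]$, $y_i \in \mathbb{F}_{q^N}^{\alpha}$, be the encoded blocks stored in the $n$ nodes.

Let $S = \{i_1, i_2, \ldots, i_t\}$ be the set of indices of the erroneous nodes. Hence the $i_j$th node, $i_j \in S$, contains $\sum_{l=1}^{k} A_{i_j,l} x_l + e^{i_j}$, where $e^{i_j} = [e_1^{i_j}, e_2^{i_j}, \ldots, e_\alpha^{i_j}]^T \in \mathbb{F}_{q^N}^{\alpha}$ denotes an adversarial error introduced by the $i_j$th node. When failed nodes are repaired, the errors from the adversarial nodes propagate to the repaired nodes. In particular, the $l$th node, $1 \le l \le n$, contains $\sum_{j=1}^{k} A_{l,j} x_j + \sum_{j=1}^{t} B_l^{i_j} e^{i_j}$, where $B_l^{i_j} \in \mathbb{F}_q^{\alpha \times \alpha}$ represents the propagation of the error $e^{i_j}$ and depends on the specific choice of the MDS array code. Suppose a data collector contacts a subset $D$ of $k$ nodes and downloads $\sum_{j=1}^{k} A_{i,j} x_j + \sum_{j=1}^{t} B_i^{i_j} e^{i_j}$ from each node $i \in D$. If these $k$ nodes are all systematic nodes, then we obtain $[x_1, x_2, \ldots, x_k]^T + Be$, where $B$ is the block matrix of size $k\alpha \times t\alpha$ over $\mathbb{F}_q$ given by

$$ B = \begin{bmatrix} B_1^{i_1} & B_1^{i_2} & \cdots & B_1^{i_t} \\ B_2^{i_1} & B_2^{i_2} & \cdots & B_2^{i_t} \\ \vdots & \vdots & \ddots & \vdots \\ B_k^{i_1} & B_k^{i_2} & \cdots & B_k^{i_t} \end{bmatrix} $$

and $e = [(e^{i_1})^T, (e^{i_2})^T, \ldots, (e^{i_t})^T]^T$. Otherwise, we obtain $[x_1, x_2, \ldots, x_k]^T + B'e$, where the block matrix $B' \in \mathbb{F}_q^{k\alpha \times t\alpha}$ represents the coefficients of $e$ obtained by decoding the array code $C$. Since $e$ has $t\alpha$ coordinates, its rank over $\mathbb{F}_q$ is at most $t\alpha$, and multiplication by a matrix over $\mathbb{F}_q$ cannot increase this rank. As $\delta \ge 2t\alpha + 1$, the MRD code $\mathcal{C}$ can correct this error. ■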
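The key step of the proof, that the collected error $Be$ (or $B'e$) has rank at most $t\alpha$ over $\mathbb{F}_q$, can be checked numerically. This sketch assumes $q = p = 5$, writes each symbol of $e$ as a length-$N$ row over GF($p$), and uses a random $k\alpha \times t\alpha$ matrix as a stand-in for the code-dependent propagation matrix; the parameters and helper names are illustrative.

```python
import random


def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over GF(p), by Gauss-Jordan elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], p - 2, p)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def matmul_mod_p(A, B, p):
    """Matrix product over GF(p)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) % p for col in cols] for row in A]


def collected_error_rank(B, E, p):
    """Rank over GF(p) of the error B e; row j of E lists the N coordinates of e_j over GF(p)."""
    return rank_mod_p(matmul_mod_p(B, E, p), p)


p, k, alpha, t, N = 5, 3, 4, 1, 12  # illustrative parameters, with m = alpha * k <= N
rng = random.Random(0)
B = [[rng.randrange(p) for _ in range(t * alpha)] for _ in range(k * alpha)]  # stand-in for B or B'
E = [[rng.randrange(p) for _ in range(N)] for _ in range(t * alpha)]          # arbitrary adversarial error
r = collected_error_rank(B, E, p)
delta = 2 * t * alpha + 1
print(r <= t * alpha, 2 * r <= delta - 1)  # True True
```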

A. DSS Dynamics under One-Time Error Model 

Now we demonstrate that, under the one-time error model, the rank of the error introduced by an adversary does not increase due to node repair dynamics. Hence, a data collector can recover the correct original information using a decoder for an MRD code. We present the case where an adversary pollutes the information stored at a single storage node. It is important to note that any optimal repair MDS array code from the DSS literature can be used as the inner code in our construction. In this subsection, we illustrate the idea of our construction with the help of two examples drawn from two different classes of optimal repair MDS array codes for DSS, presented in Section II-B.

Example 3: Let $C$ be the (5, 3) Zigzag code from Example 1. Its first three systematic nodes store a codeword $c = [c_1, c_2, \ldots, c_{12}]^T \in \mathbb{F}_{q^N}^{12}$ of a Gabidulin MRD code, which is obtained by encoding the original data. The content stored in the $i$th systematic node is $y_i = [c_{(i-1)\alpha+1}, \ldots, c_{i\alpha}]^T \in \mathbb{F}_{q^N}^{\alpha}$. Note that $\alpha = 4$ in this example. Let us assume that an adversary attacks the first storage node and introduces erroneous information. The erroneous information at the first node can be modeled as $y_1 + e = [c_1, c_2, c_3, c_4]^T + [e_1, e_2, e_3, e_4]^T$. Now assume that the second node fails. The system is oblivious to the presence of pollution at the first node and employs an exact regeneration strategy to reconstruct the second node. The reconstructed node downloads the symbols from the shaded locations at the surviving nodes, as described in Fig. 1(b), and solves a linear system of equations to obtain $[c_5, c_6, c_7, c_8]^T + [-e_1, -e_2, -2^{-1}e_1, -2^{-1}e_2]^T$, where $2^{-1}$ denotes the inverse of 2 in $\mathbb{F}_q$. Now assume that a data collector accesses the first three nodes in an attempt to recover the original data. The data collector now has access to

$$ \hat{c} = c + \begin{bmatrix} I \\ B_2 \\ 0 \end{bmatrix} e, $$

where

$$ B_2 = \begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ -2^{-1} & 0 & 0 & 0 \\ 0 & -2^{-1} & 0 & 0 \end{bmatrix}. \tag{1} $$

Note that $\hat{c}$ contains an error of rank at most four. Therefore, the original MRD codeword $c$, and subsequently the original information, can be recovered using an MRD code with rank distance at least nine.

Example 4: Let $C$ be a (5, 3) Hadamard design based code, described in Example 2. Its first three nodes store $y_i = [c_{(i-1)\alpha+1}, \ldots, c_{i\alpha}]^T \in \mathbb{F}_{q^N}^{\alpha}$, $1 \le i \le 3$, where $c = [c_1, c_2, \ldots, c_{3\alpha}]^T \in \mathbb{F}_{q^N}^{3\alpha}$ is a codeword of a Gabidulin MRD code, which is obtained by encoding the original data. Suppose an adversary modifies the information stored at the first node to $y_1 + e = [c_1, \ldots, c_\alpha]^T + [e_1, \ldots, e_\alpha]^T$. When the second node fails, a newcomer, unaware of the presence of the error at the first node, employs the interference alignment based strategy described in Example 2 and depicted in Fig. 2(b). After interference mitigation, a linear system of equations is solved to obtain $y_2 + B_2 e$. Assuming that a data collector contacts the first three nodes, it receives

$$ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} + \begin{bmatrix} I \\ B_2 \\ 0 \end{bmatrix} e, $$

which contains an error of rank at most $\alpha$. This allows the recovery of the uncorrupted information using an MRD code of sufficient minimum rank distance.

B. Code Parameters

The upper bound on the amount of data that can be stored reliably in a system with $t < k/2$ corrupted nodes, called the resilience capacity, was presented by Pawar et al. [13]. This bound is given by

$$ C(\alpha, \beta) \le \sum_{i=2t+1}^{k} \min\{(n - i)\beta, \alpha\}, \tag{2} $$

where $n$ is the number of nodes, $k$ is the number of nodes sufficient for reconstruction of the source file, $\alpha$ is the storage capacity of each DSS node, and $\beta$ is the amount of data downloaded from each of the $n - 1$ surviving nodes for the repair of a single failed node. The authors provided an explicit construction of codes that attain this bound in the bandwidth-limited regime. However, this construction has practical limitations for large values of $t$, since the decoding algorithm presented in [13] is exponential in $t$. Decoding in the construction presented in this paper is efficient, since it is based on two efficient decoding algorithms: one for the MDS array code and one for the Gabidulin code. Next, we show that our codes attain bound (2) and are thus optimal.

Fig. 2: Illustration of node repair in (5, 3) Hadamard design based codes: (a) in an error-free system, (b) in the presence of an error at the first storage node.

Let the parameters $K, N, m, \delta, \alpha, k, n$ be as described in our construction. Then $m = K + \delta - 1$ and $m = \alpha k$. Let $t$ be an integer such that $\delta = 2t\alpha + 1$. Then $\alpha k = K + \delta - 1 = K + 2\alpha t$, and hence $K = \alpha(k - 2t)$.
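The parameter relations above can be packaged as a small helper (hypothetical name `construction_parameters`). For $\alpha = 4$, $k = 3$ and $t = 1$ it gives $m = 12$ and $\delta = 9$, matching the rank distance required in Example 3, and $K = 4$; the extension degree $N$ must additionally satisfy $N \ge m$.

```python
def construction_parameters(alpha, k, t):
    """Return (m, delta, K) for the concatenated scheme with delta = 2*t*alpha + 1."""
    m = alpha * k              # MRD codeword length = total content of the k systematic nodes
    delta = 2 * t * alpha + 1  # smallest rank distance allowed by Theorem 1
    K = m - delta + 1          # Gabidulin dimension over F_{q^N}; equals alpha * (k - 2t)
    if K <= 0:
        raise ValueError("need 2t < k so that K = alpha * (k - 2t) is positive")
    return m, delta, K


print(construction_parameters(alpha=4, k=3, t=1))  # (12, 9, 4)
```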

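Now we compare this result with bound (2). Let $C$ be an MDS array code with the optimal repair property. Then $\beta = \frac{\alpha}{n-k}$. Therefore, we can rewrite bound (2) as follows:

$$ C\left(\alpha, \beta = \frac{\alpha}{n-k}\right) \le \sum_{i=2t+1}^{k} \min\left\{(n - i)\frac{\alpha}{n-k}, \alpha\right\} = \alpha(k - 2t), $$

where the equality holds because $n - i \ge n - k$ for every $i \le k$, so each of the $k - 2t$ terms equals $\alpha$. Thus, our codes attain the bound.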
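The equality can also be verified numerically. The sketch below evaluates the right-hand side of bound (2) exactly with Python's `fractions.Fraction` at $\beta = \alpha/(n-k)$ and compares it with $\alpha(k - 2t)$; the function name is illustrative.

```python
from fractions import Fraction


def resilience_bound(n, k, t, alpha, beta):
    """Right-hand side of bound (2): sum over i = 2t+1..k of min{(n - i) * beta, alpha}."""
    return sum(min((n - i) * beta, alpha) for i in range(2 * t + 1, k + 1))


# At beta = alpha / (n - k) every term equals alpha, because n - i >= n - k.
n, k, t, alpha = 5, 3, 1, 4
print(resilience_bound(n, k, t, alpha, Fraction(alpha, n - k)))  # 4
```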

IV. Dynamic Errors 
The coding scheme from the previous sections is not directly applicable to the dynamic error model, since even a single adversarial node may introduce heavy pollution of arbitrarily large rank by producing different, inconsistent outputs for the different node repairs in which it participates. However, timely detection of inconsistencies in the information provided by a single node for node repair and data collection over time can prevent the adversary from distributing too many errors through a single node, thereby bounding the rank of the error. We leverage hash functions to detect inconsistencies and extend our scheme as follows. A strong cryptographic hash of the node content is stored in a verifier. A failed node is repaired using the data from other nodes without consulting the verifier. Note that the repaired node may contain erroneous data from the adversarial nodes. After the repair is complete, the restored node asserts the validity of the received information by sending hash-based signatures of the data used for the repair to the verifier. If any inconsistency is detected, the verifier initiates a forced repair of the nodes that produced the content with the inconsistent hash. Note that such a procedure (as studied in [11]) can at best provide probabilistic guarantees.
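A minimal sketch of the consistency check, under simplifying assumptions: rather than having the restored node forward hash signatures, the hypothetical `ConsistencyVerifier` below directly records the SHA-256 digest of each answer a node gives, keyed by an assumed request identifier that names the requested linear combination. A node that answers the same request differently is flagged for forced repair. A node that consistently returns the same polluted data is not flagged; that case is the one-time error model handled by Theorem 1.

```python
import hashlib


class ConsistencyVerifier:
    """Hypothetical verifier that flags a node whose answers to the same request change over time."""

    def __init__(self):
        self._digests = {}    # (node_id, request_id) -> SHA-256 digest of the first answer
        self.flagged = set()  # nodes scheduled for a forced repair

    def record(self, node_id, request_id, payload: bytes) -> bool:
        """Return True if payload matches earlier answers to this request; otherwise flag the node."""
        digest = hashlib.sha256(payload).digest()
        expected = self._digests.setdefault((node_id, request_id), digest)
        if expected != digest:
            self.flagged.add(node_id)
            return False
        return True


v = ConsistencyVerifier()
v.record(1, "repair-node-2", b"\x03\x01")         # first answer is stored
print(v.record(1, "repair-node-2", b"\x03\x01"))  # True: consistent
print(v.record(1, "repair-node-2", b"\x07\x01"))  # False: node 1 changed its answer
print(v.flagged)                                   # {1}
```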